Dealing with missing predictor values when applying clinical prediction models.

Authors

  • Kristel J M Janssen
  • Yvonne Vergouwe
  • A Rogier T Donders
  • Frank E Harrell
  • Qingxia Chen
  • Diederick E Grobbee
  • Karel G M Moons
Abstract

BACKGROUND Prediction models combine patient characteristics and test results to predict the presence of a disease or the future occurrence of an event. When a test result (predictor) is unavailable, users applying a prediction model need a strategy for dealing with the missing value. We evaluated 6 such strategies.

METHODS We developed a prediction model for the risk of deep venous thrombosis in 1295 primary care patients and validated it in 532. In an application set of 259 patients, we mimicked 3 situations in which (1) an important predictor (D-dimer test), (2) a weaker predictor (difference in calf circumference), or (3) both predictors simultaneously were missing. The 6 strategies were (1) ignoring the predictor, (2) overall mean imputation, (3) subgroup mean imputation, (4) multiple imputation, (5) applying a submodel, derived from the development set, that includes only the observed predictors, and (6) the "one-step-sweep" method. We compared the model's discriminative ability (expressed as the ROC area) with the true ROC area (no missing values), and the model's estimated calibration slope and intercept with their ideal values of 1 and 0, respectively.

RESULTS Ignoring the predictor gave the worst discrimination and multiple imputation the best. Multiple imputation also gave calibration intercepts closest to the true value. The effect of the strategies on the calibration slope differed among the 3 scenarios.

CONCLUSIONS Multiple imputation is the preferred strategy when a predictor value is missing.
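The simpler strategies and the two performance measures in the abstract can be made concrete with a small pure-Python sketch. It assumes a logistic-regression prediction model given as an intercept plus a dict of coefficients, with each patient stored as a dict of predictor values. All function and variable names are hypothetical. Only strategies (1) to (3) are shown. Multiple imputation, the submodel and the one-step-sweep method are not. "Ignoring" a predictor is interpreted here as dropping its term from the linear predictor, which is an assumption. The calibration intercept and slope are estimated jointly by logistic regression of the outcome on the model's linear predictor. This is one common approach, not necessarily the exact procedure used in the paper.

```python
import math
from statistics import mean


def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(intercept, coefs, row):
    """Predicted risk from a logistic model: intercept + sum of b_k * x_k."""
    return logistic(intercept + sum(b * row[name] for name, b in coefs.items()))


# --- Strategies (1)-(3) for one missing predictor `name` (hypothetical helpers) ---

def ignore_predictor(row, name):
    # (1) Assumption: "ignoring" means the predictor's term contributes nothing,
    #     which for a linear term is the same as plugging in 0.
    return {**row, name: 0.0}


def overall_mean(row, name, dev_rows):
    # (2) Overall mean imputation: the predictor's mean in the development set.
    return {**row, name: mean(r[name] for r in dev_rows)}


def subgroup_mean(row, name, dev_rows, group):
    # (3) Subgroup mean imputation: mean among development patients sharing this
    #     patient's value of a (hypothetical) grouping variable `group`.
    return {**row, name: mean(r[name] for r in dev_rows if r[group] == row[group])}


# --- Performance measures ---

def roc_area(probs, outcomes):
    """ROC area: share of (event, non-event) pairs ranked correctly, ties count 1/2."""
    pos = [p for p, y in zip(probs, outcomes) if y == 1]
    neg = [p for p, y in zip(probs, outcomes) if y == 0]
    score = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return score / (len(pos) * len(neg))


def calibration(probs, outcomes, iters=50):
    """Fit logit P(y=1) = a + b * logit(p) by Newton-Raphson; ideal a = 0, b = 1."""
    lp = [math.log(p / (1.0 - p)) for p in probs]  # model's linear predictor
    a, b = 0.0, 1.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(lp, outcomes):
            mu = logistic(a + b * x)
            w = mu * (1.0 - mu)
            g0 += y - mu          # score for the intercept
            g1 += (y - mu) * x    # score for the slope
            h00 += w              # entries of the information matrix
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        a += (h11 * g0 - h01 * g1) / det   # Newton step: inverse information times score
        b += (h00 * g1 - h01 * g0) / det
    return a, b
```

In use, the chosen fill function would be applied to every application-set patient with a missing value, for example `overall_mean(row, "ddimer", dev_rows)`, where `"ddimer"` is a hypothetical column name. The next step is to compute `predict(...)` for all patients. The resulting `roc_area` and `calibration` values can then be compared with those obtained from the complete data.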

Related articles

Adaptation of Clinical Prediction Models for Application in Local Settings

BACKGROUND When planning to use a validated prediction model in new patients, adequate performance is not guaranteed. For example, changes in clinical practice over time or a different case mix than the original validation population may result in inaccurate risk predictions. OBJECTIVE To demonstrate how clinical information can direct updating a prediction model and development of a strategy...

Investigating the missing data effect on credit scoring rule based models: The case of an Iranian bank

Credit risk management is a process in which banks estimate probability of default (PD) for each loan applicant. Data sets of previous loan applicants are built by gathering their data, and these internal data sets are usually completed using external credit bureau’s data and finally used for estimating PD in banks. There is also a continuous interest for bank to use rule based classifiers to b...

Handling Missing Values when Applying Classification Models

Much work has studied the effect of different treatments of missing values on model induction, but little work has analyzed treatments for the common case of missing values at prediction time. This paper first compares several different methods—predictive value imputation, the distribution-based imputation used by C4.5, and using reduced models—for applying classification trees to instances with...

Development and validation of a prediction model with missing predictor data: a practical approach.

OBJECTIVE To illustrate the sequence of steps needed to develop and validate a clinical prediction model, when missing predictor values have been multiply imputed. STUDY DESIGN AND SETTING We used data from consecutive primary care patients suspected of deep venous thrombosis (DVT) to develop and validate a diagnostic model for the presence of DVT. Missing values were imputed 10 times with th...


Journal:
  • Clinical Chemistry

Volume 55, Issue 5

Publication year: 2009